NSF PAR Search | NSF Public Access Repository

CollabLLM: From Passive Responders to Active Collaborators

Wu, Shirley; Galley, Michel; Peng, Baolin; Cheng, Hao; Li, Gavin; Dou, Yao; Cai, Weixin; Zou, James; Leskovec, Jure; Gao, Jianfeng (July 2025, International Conference on Machine Learning)

Large Language Models are typically trained with next-turn rewards, limiting their ability to optimize for long-term interaction. As a result, they often respond passively to ambiguous or open-ended user requests, failing to help users reach their ultimate intents and leading to inefficient conversations. To address these limitations, we introduce CollabLLM, a novel and general training framework that enhances multiturn human-LLM collaboration. Its key innovation is a collaborative simulation that estimates the long-term contribution of responses using Multiturn-aware Rewards. Through reinforcement fine-tuning with these rewards, CollabLLM goes beyond responding to user requests: it actively uncovers user intent and offers insightful suggestions, a key step towards more human-centered AI. We also devise a multiturn interaction benchmark with three challenging tasks, including document creation. CollabLLM significantly outperforms our baselines, averaging 18.5% higher task performance and 46.3% improved interactivity as rated by LLM judges. Finally, we conduct a large user study with 201 judges, where CollabLLM increases user satisfaction by 17.6% and reduces the time users spend by 10.4%.
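The abstract describes estimating the long-term contribution of a response through simulation. The following is a minimal sketch of that idea as a Monte Carlo average over simulated continuations, not the authors' implementation. All names here (`multiturn_aware_reward`, `toy_simulate_future`, `toy_score`) and the toy simulator and scoring rule are hypothetical, chosen only for illustration.

```python
import random
from statistics import mean
from typing import Callable, List

Conversation = List[str]


def multiturn_aware_reward(
    history: Conversation,
    response: str,
    simulate_future: Callable[[Conversation, random.Random], Conversation],
    score_conversation: Callable[[Conversation], float],
    num_rollouts: int = 8,
    seed: int = 0,
) -> float:
    """Average the score of several simulated continuations of `response`."""
    rng = random.Random(seed)
    prefix = history + [response]
    scores = []
    for _ in range(num_rollouts):
        # Sample one possible future of the conversation after this response.
        future = simulate_future(prefix, rng)
        # Score the whole conversation, not just the next turn.
        scores.append(score_conversation(prefix + future))
    return mean(scores)


def toy_simulate_future(prefix: Conversation, rng: random.Random) -> Conversation:
    """Hypothetical user simulator: a clarifying question saves correction rounds."""
    asked_clarifying_question = prefix[-1].rstrip().endswith("?")
    rounds = rng.randint(0, 1) if asked_clarifying_question else rng.randint(2, 3)
    future: Conversation = []
    for _ in range(rounds):
        future += ["user: that is not what I meant", "assistant: revised draft"]
    future += ["user: this works, thanks", "assistant: done"]
    return future


def toy_score(conversation: Conversation) -> float:
    """Hypothetical conversation-level score: task done, minus a per-turn cost."""
    return 1.0 - 0.05 * len(conversation)


history = ["user: write a summary of my report"]
mr_clarify = multiturn_aware_reward(
    history, "assistant: how long should it be?", toy_simulate_future, toy_score
)
mr_passive = multiturn_aware_reward(
    history, "assistant: here is a summary.", toy_simulate_future, toy_score
)
```

Under these toy assumptions the clarifying response receives the higher estimate, because the simulated conversations that follow it are shorter. A next-turn reward would not see that difference.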
Free, publicly-accessible full text available July 13, 2026
Iron isotope evidence in continental intraplate basalts for mantle lithosphere imprint on heterogenous asthenospheric melts

https://doi.org/10.1016/j.epsl.2023.118499

Xu, Rong; Lambart, Sarah; Nebel, Oliver; Li, Ming; Bai, Zhongjie; Zhang, Junbo; Zhang, Ganglan; Gao, Jianfeng; Zhong, Hong; Liu, Yongsheng (January 2024, Earth and Planetary Science Letters)

Full Text Available
Learning Customized Visual Models with Retrieval-Augmented Knowledge

https://doi.org/10.1109/CVPR52729.2023.01454

Liu, Haotian; Son, Kilho; Yang, Jianwei; Liu, Ce; Gao, Jianfeng; Lee, Yong Jae; Li, Chunyuan (June 2023, IEEE)

Full Text Available
GLIGEN: Open-Set Grounded Text-to-Image Generation

https://doi.org/10.1109/CVPR52729.2023.02156

Li, Yuheng; Liu, Haotian; Wu, Qingyang; Mu, Fangzhou; Yang, Jianwei; Gao, Jianfeng; Li, Chunyuan; Lee, Yong Jae (June 2023, IEEE)

Full Text Available
Tree Prompting: Efficient Task Adaptation without Fine-Tuning

https://doi.org/10.18653/v1/2023.emnlp-main.384

Singh, Chandan; Morris, John; Rush, Alexander; Gao, Jianfeng; Deng, Yuntian (January 2023, Association for Computational Linguistics)

Full Text Available
Generalized Decoding for Pixel, Image, and Language

https://doi.org/10.1109/CVPR52729.2023.01451

Zou, Xueyan; Dou, Zi-Yi; Yang, Jianwei; Gan, Zhe; Li, Linjie; Li, Chunyuan; Dai, Xiyang; Behl, Harkirat; Wang, Jianfeng; Yuan, Lu; et al (June 2023, IEEE)

Full Text Available
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone

Dou, Zi-Yi; Kamath, Aishwarya; Gan, Zhe; Zhang, Pengchuan; Wang, Jianfeng; Li, Linjie; Liu, Zicheng; Liu, Ce; LeCun, Yann; Peng, Nanyun; et al (October 2022, NeurIPS)

Vision-language (VL) pre-training has recently received considerable attention. However, most existing end-to-end pre-training approaches either tackle only VL tasks that test high-level understanding of images, such as image-text retrieval, visual question answering (VQA), and image captioning, or target only region-level understanding for tasks such as phrase grounding and object detection. We present FIBER (Fusion-In-the-Backbone-based transformER), a new VL model architecture that can seamlessly handle both of these types of tasks. Instead of having dedicated transformer layers for fusion after the uni-modal backbones, FIBER pushes multimodal fusion deep into the model by inserting cross-attention into the image and text backbones to better capture multimodal interactions. In addition, unlike previous work that is pre-trained either only on image-text data or only on fine-grained data with box-level annotations, we present a two-stage pre-training strategy that uses both kinds of data efficiently: (i) coarse-grained pre-training based on image-text data, followed by (ii) fine-grained pre-training based on image-text-box data. We conduct comprehensive experiments on a wide range of VL tasks, from VQA, image captioning, and retrieval to phrase grounding, referring expression comprehension, and object detection. Using deep multimodal fusion coupled with the two-stage pre-training, FIBER provides consistent performance improvements over strong baselines across all tasks, often outperforming methods that use orders of magnitude more data. Code is released at https://github.com/microsoft/FIBER.
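The idea of inserting cross-attention into each backbone can be illustrated with a small, dependency-free sketch. It is not the released FIBER code. It assumes identity projections (no learned weights), a shared feature width for both modalities, and a hypothetical scalar `gate` that controls how much of the other modality is mixed in. The function names are illustrative only.

```python
import math
from typing import List

Tokens = List[List[float]]  # one feature vector per token


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Tokens, keys_values: Tokens) -> Tokens:
    """Scaled dot-product attention from one modality onto the other."""
    d = len(queries[0])
    attended = []
    for q in queries:
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys_values
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * kv[i] for w, kv in zip(weights, keys_values)) for i in range(d)]
        )
    return attended


def fused_layer(own: Tokens, other: Tokens, gate: float) -> Tokens:
    """Residual update: own tokens plus gated cross-attention to the other modality."""
    attended = cross_attention(own, other)
    return [
        [x + gate * a for x, a in zip(row, att)] for row, att in zip(own, attended)
    ]


# Toy inputs: two image tokens and one text token, all of width 2.
image_tokens = [[1.0, 0.0], [0.0, 1.0]]
text_tokens = [[0.0, 2.0]]

# Fusion runs in both directions inside the backbones, not in a separate stack.
fused_image = fused_layer(image_tokens, text_tokens, gate=0.5)
fused_text = fused_layer(text_tokens, image_tokens, gate=0.5)
```

With `gate=0.0` the layer returns its input unchanged, so in this sketch each backbone behaves as a uni-modal encoder until the gate is opened.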
Full Text Available
Grounded Keys-to-Text Generation: Towards Factual Open-Ended Generation

https://doi.org/10.18653/v1/2022.findings-emnlp.547

Brahman, Faeze; Peng, Baolin; Galley, Michel; Rao, Sudha; Dolan, Bill; Chaturvedi, Snigdha; Gao, Jianfeng (January 2022, Proceedings of the Findings of the 2022 Conference on Empirical Methods in Natural Language Processing)

Full Text Available
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning

https://doi.org/10.18653/v1/2022.emnlp-main.388

Wang, Yaqing; Agarwal, Sahaj; Mukherjee, Subhabrata; Liu, Xiaodong; Gao, Jing; Awadallah, Ahmed Hassan; Gao, Jianfeng (January 2022, Association for Computational Linguistics)

Full Text Available
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models

Li, Chunyuan; Liu, Haotian; Li, Harold; Zhang, Pengchuan; Aneja, Jyoti; Yang, Jianwei; Jin, Ping; Hu, Houdong; Liu, Zicheng; Lee, Yong Jae; et al (January 2022, Neural Information Processing Systems (NeurIPS))

Full Text Available
